This notebook analyzes tweets by wizkidfc over a one-week period. The dataset is an Excel file containing 2076 tweets in two sheets: the first holds the tweets and information about each tweet, while the second holds information about the tweep behind each tweet.

We begin by importing the necessary libraries and packages.

In [35]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import plotly.express as px
In [36]:
import plotly.io as pio
pio.renderers.default = "notebook+pdf+jupyterlab"
In [37]:
from wordcloud import WordCloud
In [38]:
from nltk.stem import WordNetLemmatizer
In [39]:
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, classification_report

In the cell below, the data in the tweets sheet is imported and assigned to the variable wizkid. The cells that follow inspect the features of the dataset and give us a better picture of the data in general.

In [40]:
wizkid = pd.read_excel('wizkid_tweets.xlsx', sheet_name='tweets')
wizkid.head()
Out[40]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type URLs Hashtags Mentions Media Type Media URLs Unnamed: 16 Unnamed: 17 Unnamed: 18
0 1510157522514153474 RT @Kwm913 : WIZKID LE 16 SEPTEMBRE A PANAAAAM... Xaro Xaro_music 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet NaN 0 0 NaN NaN NaN NaN NaN
1 1510157521859878916 RT @Abbye_edi__ : 30bgs have been cooking and ... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet NaN 0 0 NaN NaN NaN NaN NaN
2 1510157521377546242 RT @Cruisewithmee : There’s a reason the indus... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet NaN 0 0 NaN NaN NaN NaN NaN
3 1510157516432420865 @heisizumichaels Davido, Wizkid, Burna boy and... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Reply NaN 0 1 NaN NaN NaN NaN NaN
4 1510157513991278601 RT @_DianaLuv : Nicki minaj Body, Wizkid menta... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet NaN 0 0 photo https://pbs.twimg.com/media/FPPSC0iWUAUGN0Y.jpg NaN NaN NaN
In [41]:
wizkid.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2076 entries, 0 to 2075
Data columns (total 19 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Tweet Id     2076 non-null   int64 
 1   Text         2076 non-null   object
 2   Name         2076 non-null   object
 3   Screen Name  2076 non-null   object
 4   UTC          2076 non-null   object
 5   Created At   2076 non-null   object
 6   Favorites    2076 non-null   int64 
 7   Retweets     2076 non-null   int64 
 8   Language     2076 non-null   object
 9   Client       2076 non-null   object
 10  Tweet Type   2076 non-null   object
 11  URLs         429 non-null    object
 12  Hashtags     2076 non-null   int64 
 13  Mentions     2076 non-null   int64 
 14  Media Type   637 non-null    object
 15  Media URLs   637 non-null    object
 16  Unnamed: 16  51 non-null     object
 17  Unnamed: 17  35 non-null     object
 18  Unnamed: 18  22 non-null     object
dtypes: int64(5), object(14)
memory usage: 308.3+ KB
In [42]:
wizkid.isnull().sum()
Out[42]:
Tweet Id          0
Text              0
Name              0
Screen Name       0
UTC               0
Created At        0
Favorites         0
Retweets          0
Language          0
Client            0
Tweet Type        0
URLs           1647
Hashtags          0
Mentions          0
Media Type     1439
Media URLs     1439
Unnamed: 16    2025
Unnamed: 17    2041
Unnamed: 18    2054
dtype: int64
In [43]:
wizkid.dtypes
Out[43]:
Tweet Id        int64
Text           object
Name           object
Screen Name    object
UTC            object
Created At     object
Favorites       int64
Retweets        int64
Language       object
Client         object
Tweet Type     object
URLs           object
Hashtags        int64
Mentions        int64
Media Type     object
Media URLs     object
Unnamed: 16    object
Unnamed: 17    object
Unnamed: 18    object
dtype: object

The unnamed features are not required, so they are removed.

In [44]:
wizkid.drop(['Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18'], axis=1, inplace=True)
In [45]:
wizkid.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2076 entries, 0 to 2075
Data columns (total 16 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Tweet Id     2076 non-null   int64 
 1   Text         2076 non-null   object
 2   Name         2076 non-null   object
 3   Screen Name  2076 non-null   object
 4   UTC          2076 non-null   object
 5   Created At   2076 non-null   object
 6   Favorites    2076 non-null   int64 
 7   Retweets     2076 non-null   int64 
 8   Language     2076 non-null   object
 9   Client       2076 non-null   object
 10  Tweet Type   2076 non-null   object
 11  URLs         429 non-null    object
 12  Hashtags     2076 non-null   int64 
 13  Mentions     2076 non-null   int64 
 14  Media Type   637 non-null    object
 15  Media URLs   637 non-null    object
dtypes: int64(5), object(11)
memory usage: 259.6+ KB

We're not really going to be working with pictures or videos, so it's better to just drop the Media URLs and URLs columns.

In [46]:
wizkid.drop(['Media URLs', 'URLs'], axis=1, inplace=True)
In [47]:
wizkid.head()
Out[47]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type
0 1510157522514153474 RT @Kwm913 : WIZKID LE 16 SEPTEMBRE A PANAAAAM... Xaro Xaro_music 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN
1 1510157521859878916 RT @Abbye_edi__ : 30bgs have been cooking and ... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN
2 1510157521377546242 RT @Cruisewithmee : There’s a reason the indus... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN
3 1510157516432420865 @heisizumichaels Davido, Wizkid, Burna boy and... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Reply 0 1 NaN
4 1510157513991278601 RT @_DianaLuv : Nicki minaj Body, Wizkid menta... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo

To check the total number of tweets in that period:

In [48]:
print('Total tweets this period: ', len(wizkid))
Total tweets this period:  2076

To begin our exploratory data analysis (EDA), we're going to first check for the most retweeted tweets in the period.

In [49]:
retweet_df = wizkid.sort_values(by='Retweets', ascending=False).reset_index(drop=True)
print("The top 5 most retweeted tweets", retweet_df.head(5)['Name'], retweet_df['Text'].head(5), retweet_df['Retweets'].head(5))
The top 5 most retweeted tweets 0      olyria roy ☮️
1    Beautiful Synto
2           LERE BOY
3             Dapsy𓃵
4           B L G 🤴💫
Name: Name, dtype: object 0    Peaceful ramadan, no davido, graduate Rihanna,...
1    Forget Rihanna Beyonce and Wizkid, How deep ca...
2    tatibiji sef dey drag artiste 😂 2baba probably...
3    I wonder why it is hard for some people to rea...
4    Ease your mind and Blessed arguably has to be ...
Name: Text, dtype: object 0    74
1    34
2    32
3    29
4    18
Name: Retweets, dtype: int64

It's better to create a dataframe, so it can be visualized more easily

In [50]:
d = {'Name': retweet_df['Name'].head(5), 'Tweets': retweet_df['Text'].head(5), 'Retweets': retweet_df['Retweets'].head(5)}
retweet_df = pd.DataFrame(data=d)

retweet_df
Out[50]:
Name Tweets Retweets
0 olyria roy ☮️ Peaceful ramadan, no davido, graduate Rihanna,... 74
1 Beautiful Synto Forget Rihanna Beyonce and Wizkid, How deep ca... 34
2 LERE BOY tatibiji sef dey drag artiste 😂 2baba probably... 32
3 Dapsy𓃵 I wonder why it is hard for some people to rea... 29
4 B L G 🤴💫 Ease your mind and Blessed arguably has to be ... 18

Yeah, it's better this way. It's better when we visualize it as a chart though

In [51]:
fig = px.bar(retweet_df, x='Name', y='Retweets', title='Plot of tweep with respective number of retweets in the period')
fig.show()
In [52]:
retweet_df.query("Name == 'B L G 🤴💫'")['Tweets']
Out[52]:
4    Ease your mind and Blessed arguably has to be ...
Name: Tweets, dtype: object

Going further, another thing I'd like to check is the popularity of the different tweet clients.

The cell below counts the tweets sent from each client. From the output, we can see that Twitter for Android is the most used client among wizkid fans in this dataset (1135 tweets, against 891 for Twitter for iPhone). You might have expected iPhone to come out on top, but Android wins again.

What surprised me most among the clients is the wizkid retweet bot, which helps to retweet posts. Shout-out to whoever created this bot, you're also using your skills to help the fandom.

In [53]:
tweet_client2 = wizkid['Client'].value_counts()
tweet_client2.head()
Out[53]:
<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>                                    1135
<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>                                       891
<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>                                                   37
<a href="https://help.twitter.com/en/using-twitter/how-to-tweet#source-labels" rel="nofollow">wizkid retweet bot</a>       6
<a href="https://ifttt.com" rel="nofollow">IFTTT</a>                                                                       2
Name: Client, dtype: int64
In [54]:
d = {'Client':['Twitter for android', 'Twitter for iPhone', 'Twitter web app', 'wizkid retweet bot', 'IFTTT'], 'Counts': [1135, 891, 37, 6, 2]}
client_df = pd.DataFrame(data=d)
client_df.head()
Out[54]:
Client Counts
0 Twitter for android 1135
1 Twitter for iPhone 891
2 Twitter web app 37
3 wizkid retweet bot 6
4 IFTTT 2
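The counts in the cell above were typed in by hand from the previous output. Here is a minimal sketch of deriving the labels and counts from the raw Client strings instead, using only the standard library. The `sample_clients` list and the `client_label` helper are made up for illustration and are not part of the notebook.

```python
import re
from collections import Counter

# Hypothetical sample of raw Client values, in the same HTML-anchor form as the dataset.
sample_clients = [
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="https://ifttt.com" rel="nofollow">IFTTT</a>',
]

def client_label(raw):
    """Return the visible text of the anchor tag, or the raw string if it does not match."""
    match = re.search(r'>([^<]+)</a>', raw)
    return match.group(1) if match else raw

# Count tweets per readable client label.
client_counts = Counter(client_label(c) for c in sample_clients)
print(client_counts.most_common())
```

In the notebook itself, something like `wizkid['Client'].map(client_label).value_counts()` should give the same table without copying numbers by hand.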

To visualize it better, I'm gonna be using a bar chart once again

In [55]:
fig = px.bar(client_df, x='Client', y='Counts', title='Plot of tweet client with the count')
fig.show()
In [56]:
wizkid.head()
Out[56]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type
0 1510157522514153474 RT @Kwm913 : WIZKID LE 16 SEPTEMBRE A PANAAAAM... Xaro Xaro_music 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN
1 1510157521859878916 RT @Abbye_edi__ : 30bgs have been cooking and ... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN
2 1510157521377546242 RT @Cruisewithmee : There’s a reason the indus... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN
3 1510157516432420865 @heisizumichaels Davido, Wizkid, Burna boy and... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Reply 0 1 NaN
4 1510157513991278601 RT @_DianaLuv : Nicki minaj Body, Wizkid menta... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo

The next thing I'm going to do is analyze tweet sentiments.

Just a side note: I didn't know much about Natural Language Processing before starting this analysis, so I had to do some reading and take a quick course on it. I'm still not fully conversant with it, but I kind of know my way around it now.

For the sentiment analysis, I used TextBlob, a Python library for processing textual data. TextBlob is a high-level library built on top of the NLTK library.

The functions below get the subjectivity and polarity of each tweet. Subjectivity measures how much a tweet expresses personal opinion, emotion or judgement, as opposed to factual information. It is a float in the range [0, 1], where 0 is fully objective and 1 is fully subjective.

Polarity is also a float, in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.

In [57]:
from textblob import TextBlob
def get_subjectivity(text):
    return TextBlob(text).sentiment.subjectivity

def get_polarity(text):
    return TextBlob(text).sentiment.polarity
In [58]:
wizkid['subjectivity'] = wizkid['Text'].apply(get_subjectivity)
wizkid['polarity'] = wizkid['Text'].apply(get_polarity)
In [59]:
wizkid.head(5)
Out[59]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type subjectivity polarity
0 1510157522514153474 RT @Kwm913 : WIZKID LE 16 SEPTEMBRE A PANAAAAM... Xaro Xaro_music 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.000000 0.000
1 1510157521859878916 RT @Abbye_edi__ : 30bgs have been cooking and ... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.350000 -0.175
2 1510157521377546242 RT @Cruisewithmee : There’s a reason the indus... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN 0.000000 0.000
3 1510157516432420865 @heisizumichaels Davido, Wizkid, Burna boy and... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Reply 0 1 NaN 0.558333 0.450
4 1510157513991278601 RT @_DianaLuv : Nicki minaj Body, Wizkid menta... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.000000 0.000

After getting the polarity of each tweet, it makes more sense to categorize them as positive, negative or neutral.

Tweets with a score less than zero are negative, tweets with a score of zero are neutral while tweets with a score more than zero are positive tweets.

In [60]:
def getAnalysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'
In [61]:
wizkid['analysis'] = wizkid['polarity'].apply(getAnalysis)
In [62]:
wizkid.head(5)
Out[62]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type subjectivity polarity analysis
0 1510157522514153474 RT @Kwm913 : WIZKID LE 16 SEPTEMBRE A PANAAAAM... Xaro Xaro_music 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.000000 0.000 Neutral
1 1510157521859878916 RT @Abbye_edi__ : 30bgs have been cooking and ... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.350000 -0.175 Negative
2 1510157521377546242 RT @Cruisewithmee : There’s a reason the indus... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN 0.000000 0.000 Neutral
3 1510157516432420865 @heisizumichaels Davido, Wizkid, Burna boy and... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Reply 0 1 NaN 0.558333 0.450 Positive
4 1510157513991278601 RT @_DianaLuv : Nicki minaj Body, Wizkid menta... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.000000 0.000 Neutral

The next cell counts the tweets in each sentiment category. Tweets with positive sentiment came out on top. 😊🎉🎉 You're happy right? Well, I am too 😂. Let's just try to keep tweets with negative sentiments down 🤞🤞.

In [63]:
wizkid['analysis'].value_counts()
Out[63]:
Positive    1163
Neutral      577
Negative     336
Name: analysis, dtype: int64
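To put those counts in proportion, here is a small worked calculation of each category's share of the 2076 tweets. The numbers are copied from the output above; nothing else is assumed.

```python
# Counts copied from the value_counts() output above.
counts = {'Positive': 1163, 'Neutral': 577, 'Negative': 336}
total = sum(counts.values())  # 2076, the size of the dataset

# Share of each sentiment label, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
for label, pct in shares.items():
    print(f'{label}: {pct}%')
```

So a little over half of the tweets are labelled positive, and roughly one in six is labelled negative.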

As always, the analysis is easier to read as a chart, and a bar chart suits these counts well.

In [64]:
plt.title('Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Counts')
wizkid['analysis'].value_counts().plot(kind='bar')
plt.show()

The next analysis we are going to do is plot a word cloud based on the most popular words in the Text column. Before we can plot the word cloud though, we have to do some preprocessing.

The cell below converts all the tweets to lowercase.

In [65]:
wizkid['Text'] = wizkid['Text'].str.lower()
wizkid['Text'].tail()
Out[65]:
2071    rt @ms_fej : 2baba will still post wizkid when...
2072    rt @jameelasosexy : what wizkid &amp; buju bnx...
2073    rt @firstladyship : why do nigerian stans figh...
2074    rt @_asiwajulerry : you wonder why people like...
2075    rt @savvy_elijah : nobody:\ndavido patiently w...
Name: Text, dtype: object
In [66]:
stopwordlist = ['rt', 'a', 'about', 'above', 'after', 'again', 'ain', 'all', 'am', 'an', 'and','any','are', 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below', 'between','both', 'by', 'can', 'd', 'did', 'do', 'does', 'doing', 'down', 'during', 'each','few', 'for', 'from', 'further', 'had', 'has', 'have', 'having', 'he', 'her', 'here', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'i', 'if', 'in', 'into','is', 'it', 'its', 'itself', 'just', 'll', 'm', 'ma', 'me', 'more', 'most','my', 'myself', 'now', 'o', 'of', 'on', 'once', 'only', 'or', 'other', 'our', 'ours','ourselves', 'out', 'own', 're','s', 'same', 'she', "shes", 'should', "shouldve",'so', 'some', 'such', 't', 'than', 'that', "thatll", 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'there', 'these', 'they', 'this', 'those', 'through', 'to', 'too','under', 'until', 'up', 've', 'very', 'was', 'we', 'were', 'what', 'when', 'where','which','while', 'who', 'whom', 'why', 'will', 'with', 'won', 'y', 'you', "youd","youll", "youre", "youve", 'your', 'yours', 'yourself', 'yourselves']
In [67]:
STOPWORDS = set(stopwordlist)
def cleaning_stopwords(text):
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])
wizkid['Text'] = wizkid['Text'].apply(lambda text: cleaning_stopwords(text))
wizkid['Text'].tail()
Out[67]:
2071    @ms_fej : 2baba still post wizkid wins sunday!...
2072    @jameelasosexy : wizkid &amp; buju bnxn said m...
2073    @firstladyship : nigerian stans fight over dav...
2074    @_asiwajulerry : wonder people like wizkid it’...
2075    @savvy_elijah : nobody: davido patiently waiti...
Name: Text, dtype: object

Punctuation is also removed, which reduces unnecessary noise in the dataset.

In [68]:
import string
english_punctuations = string.punctuation
punctuations_list = english_punctuations
def cleaning_punctuations(text):
    translator = str.maketrans('', '', punctuations_list)
    return text.translate(translator)
wizkid['Text']= wizkid['Text'].apply(lambda x: cleaning_punctuations(x))
wizkid['Text'].tail()
Out[68]:
2071    msfej  2baba still post wizkid wins sunday one...
2072    jameelasosexy  wizkid amp buju bnxn said mood ...
2073    firstladyship  nigerian stans fight over david...
2074    asiwajulerry  wonder people like wizkid it’s t...
2075    savvyelijah  nobody davido patiently waiting w...
Name: Text, dtype: object
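Notice that "it’s" in row 2074 keeps its apostrophe: `string.punctuation` only covers ASCII punctuation, and the tweets use the typographic ’ character. Here is a minimal sketch of widening the set. Which extra characters to include is my own assumption, and `strip_punctuation` is an illustrative name.

```python
import string

# ASCII punctuation plus a few typographic marks that tweets often contain.
# The extra characters are an assumption about what is worth stripping here.
EXTRA_PUNCTUATION = '’‘“”…'
translator = str.maketrans('', '', string.punctuation + EXTRA_PUNCTUATION)

def strip_punctuation(text):
    """Delete every character listed in the translation table."""
    return text.translate(translator)

print(strip_punctuation('wonder people like wizkid it’s true!'))
```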

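After that, the next cell is meant to remove repeated characters from words (as in "panaaaame"). As written, the pattern contains a literal 1 where a backreference \1 is needed, so it matches any character followed by the digit 1 rather than a repeated character, and the tweets shown below are unchanged.

In [69]:
import re
def cleaning_repeating_char(text):
    # '(.)1+' matches any character followed by one or more literal "1" digits
    return re.sub(r'(.)1+', r'1', text)
wizkid['Text'] = wizkid['Text'].apply(lambda x: cleaning_repeating_char(x))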
wizkid['Text'].tail()
Out[69]:
2071    msfej  2baba still post wizkid wins sunday one...
2072    jameelasosexy  wizkid amp buju bnxn said mood ...
2073    firstladyship  nigerian stans fight over david...
2074    asiwajulerry  wonder people like wizkid it’s t...
2075    savvyelijah  nobody davido patiently waiting w...
Name: Text, dtype: object
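The pattern in the cell above has no backreference, which is why the output does not change. Here is a hedged sketch of what a working version could look like. Shortening long runs to two characters (instead of one) is my own choice, so that ordinary double letters like the "oo" in "cooking" survive. `collapse_repeats` and its `keep` parameter are illustrative names.

```python
import re

def collapse_repeats(text, keep=2):
    """Shorten any run of the same character longer than `keep` down to `keep` copies."""
    # (.) captures one character; \1{keep,} requires at least `keep` more copies of it.
    pattern = r'(.)\1{%d,}' % keep
    return re.sub(pattern, lambda m: m.group(1) * keep, text)

print(collapse_repeats('panaaaame ouiiiiii'))
print(collapse_repeats('still cooking'))
```

Swapping this in would change the tokens (and so the word clouds) further down, so the outputs would need to be regenerated.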

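This next cell is meant to strip URLs from the tweets. Because punctuation (including : / and .) was already removed in an earlier cell, the tweets no longer contain anything of the form https://, so the cell has little left to act on and the tail shown below is unchanged. Note also that [^s] means "any character except the letter s"; the whitespace class is written \s.

In [70]:
def cleaning_URLs(data):
    # Intended to replace URLs with a space; [^s]+ stops at the letter "s", not at whitespace
    return re.sub('((www.[^s]+)|(https?://[^s]+))',' ',data)
wizkid['Text'] = wizkid['Text'].apply(lambda x: cleaning_URLs(x))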
wizkid['Text'].tail()
Out[70]:
2071    msfej  2baba still post wizkid wins sunday one...
2072    jameelasosexy  wizkid amp buju bnxn said mood ...
2073    firstladyship  nigerian stans fight over david...
2074    asiwajulerry  wonder people like wizkid it’s t...
2075    savvyelijah  nobody davido patiently waiting w...
Name: Text, dtype: object
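For URL removal to have an effect, it has to run before punctuation is stripped, and the character class needs to exclude whitespace. Here is a minimal sketch under those assumptions. `URL_PATTERN` and `remove_urls` are illustrative names, and the example tweet is made up.

```python
import re

# \S+ means "one or more non-whitespace characters";
# [^s]+ would instead mean "one or more characters other than the letter s".
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def remove_urls(text):
    """Replace each URL with a single space."""
    return URL_PATTERN.sub(' ', text)

print(remove_urls('new video https://t.co/abc123 out now'))
```

A plausible order for the cleaning steps would then be: lowercase, URLs, punctuation, numbers, stopwords.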

This next cell cleans numbers from the tweets

In [71]:
def cleaning_numbers(data):
    return re.sub('[0-9]+', '', data)
wizkid['Text'] = wizkid['Text'].apply(lambda x: cleaning_numbers(x))
wizkid['Text'].tail()
Out[71]:
2071    msfej  baba still post wizkid wins sunday one ...
2072    jameelasosexy  wizkid amp buju bnxn said mood ...
2073    firstladyship  nigerian stans fight over david...
2074    asiwajulerry  wonder people like wizkid it’s t...
2075    savvyelijah  nobody davido patiently waiting w...
Name: Text, dtype: object

This next cell tokenizes the cleaned tweets. Tokenization separates each sentence into its individual words.

In [72]:
from nltk.tokenize import RegexpTokenizer
# Raw string so the backslashes reach the regex engine without escape-sequence warnings
tokenizer = RegexpTokenizer(r'\w+|\$[\d\.]+|\S+')
wizkid['Text'] = wizkid['Text'].apply(tokenizer.tokenize)
wizkid['Text'].head()
Out[72]:
0    [kwm, wizkid, le, septembre, panaaaame, ouiiiiii]
1    [abbyeedi, bgs, cooking, throwing, yabs, burna...
2    [cruisewithmee, there, ’s, reason, industry, c...
3    [heisizumichaels, davido, wizkid, burna, boy, ...
4    [dianaluv, nicki, minaj, body, wizkid, mentali...
Name: Text, dtype: object
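To see why row 2 contains the odd token "’s", the same pattern can be tried with the standard `re` module. To my understanding, `RegexpTokenizer` in its default mode returns the substrings that match the pattern, much like `re.findall`, but treat that equivalence as an assumption. The sample text is taken from the row above.

```python
import re

# Same pattern as in the tokenizer cell: words, then dollar amounts, then any other non-space run.
TOKEN_PATTERN = r'\w+|\$[\d\.]+|\S+'

tokens = re.findall(TOKEN_PATTERN, 'there’s reason industry')
print(tokens)
```

`\w+` stops at the typographic apostrophe, and the leftover "’s" is then picked up by `\S+` as its own token. The `\$[\d\.]+` branch has nothing to match at this stage, since `$` and digits were removed earlier.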

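Finally, the last two preprocessing cells set up stemming (reducing words to their stems) and lemmatization (reducing derived words to their root form, known as the lemma). Note that both helper functions return their input, data, rather than the transformed list, text, so the token lists shown in the outputs are unchanged.

In [73]:
import nltk
st = nltk.PorterStemmer()
def stemming_on_text(data):
    text = [st.stem(word) for word in data]
    # The stemmed list is built but not returned, so the tokens pass through unchanged
    return data
wizkid['Text'] = wizkid['Text'].apply(lambda x: stemming_on_text(x))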
wizkid['Text'].head()
Out[73]:
0    [kwm, wizkid, le, septembre, panaaaame, ouiiiiii]
1    [abbyeedi, bgs, cooking, throwing, yabs, burna...
2    [cruisewithmee, there, ’s, reason, industry, c...
3    [heisizumichaels, davido, wizkid, burna, boy, ...
4    [dianaluv, nicki, minaj, body, wizkid, mentali...
Name: Text, dtype: object
In [74]:
lm = nltk.WordNetLemmatizer()
def lemmatizer_on_text(data):
    text = [lm.lemmatize(word) for word in data]
    # The lemmatized list is built but not returned, so the tokens pass through unchanged
    return data
wizkid['Text'] = wizkid['Text'].apply(lambda x: lemmatizer_on_text(x))
wizkid['Text'].head()
Out[74]:
0    [kwm, wizkid, le, septembre, panaaaame, ouiiiiii]
1    [abbyeedi, bgs, cooking, throwing, yabs, burna...
2    [cruisewithmee, there, ’s, reason, industry, c...
3    [heisizumichaels, davido, wizkid, burna, boy, ...
4    [dianaluv, nicki, minaj, body, wizkid, mentali...
Name: Text, dtype: object
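Both helpers above build a transformed list and then return the original one. Here is a minimal sketch of a helper that returns the transformed tokens. `transform_tokens` is an illustrative name, and `toy_stem` is a made-up stand-in so the example runs without NLTK; it is not a real stemmer.

```python
def transform_tokens(tokens, transform):
    """Apply `transform` to every token and return the new list (not the input)."""
    return [transform(word) for word in tokens]

# Toy stand-in for a real stemmer, used only so the example runs without NLTK.
def toy_stem(word):
    return word[:-3] if word.endswith('ing') and len(word) > 5 else word

print(transform_tokens(['cooking', 'throwing', 'yabs'], toy_stem))
```

With NLTK loaded as in the notebook, the same helper could be called as `transform_tokens(tokens, st.stem)` or `transform_tokens(tokens, lm.lemmatize)`. Doing so would change the tokens that feed the word clouds below.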

Phew 😪. At long last, the preprocessing part is done.

In [75]:
new_tweets = " "
for tweets in wizkid.Text:
    new_tweets += " ".join(tweets) + " "
In [76]:
#new_tweets

So, this is the part we've been preprocessing for 🎉🎉

In [77]:
wc = WordCloud(max_words=1000, width=1600, height=1000, collocations=False).generate(new_tweets)
plt.imshow(wc)
print(len(new_tweets))
256119

So, here is the word cloud (I almost typed soundcloud 😂). As we can see from the chart, wizkid is the most popular word here. It has to be (all this is about him). The next is davido, which was kinda expected as they get compared in almost every tweet (which isn't at all necessary). I can see burna too, but boy is missing (another insight: people usually drop the boy and just call him burna). Next is the grammys, which has been the major subject in recent tweets. I feel bad he didn't win, we can go again next time 🤞.

I can also see love, wizkidfc is preaching love, that's nice, really nice, 😂

But then, I can see true, which is almost the same size as love, wizkidfc is preaching true love, that's lovely 😂 (I hope you understand what I did here).
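The sizes read off a word cloud can be checked with plain counts. Here is a minimal sketch using `collections.Counter`. The `token_lists` data is a made-up stand-in for the rows of the Text column, not the real tweets.

```python
from collections import Counter

# Hypothetical token lists standing in for the rows of wizkid['Text'].
token_lists = [
    ['wizkid', 'davido', 'burna'],
    ['wizkid', 'grammys', 'love'],
    ['wizkid', 'davido', 'true', 'love'],
]

# Flatten the lists and count how often each word appears.
word_counts = Counter(word for tokens in token_lists for word in tokens)
print(word_counts.most_common(3))
```

Run on the real column, `most_common(10)` would show whether love and true really are about the same size.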

So, here is a word cloud of the tweets labelled positive. The labelling gets some tweets wrong, so a few words look out of place, but we can still see the words that were categorized correctly.

In [78]:
positives = wizkid.query('analysis == "Positive"')
positive_tweets = " "
for positivess in positives.Text:
    positive_tweets += " ".join(positivess) + " "
wc = WordCloud(max_words=1000, width=1600, height=1000, collocations=False).generate(positive_tweets)
plt.imshow(wc)
print(len(positive_tweets))
152345

And here is a word cloud of the tweets labelled negative. Again, the labelling gets some tweets wrong, but we can still see the words that were categorized correctly.

In [79]:
negatives = wizkid.query('analysis == "Negative"')
negative_tweets = " "
for negativess in negatives.Text:
    negative_tweets += " ".join(negativess) + " "
wc = WordCloud(max_words=1000, width=1600, height=1000, collocations=False).generate(negative_tweets)
plt.imshow(wc)
print(len(negative_tweets))
51746
In [80]:
wizkid.head(5)
Out[80]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type subjectivity polarity analysis
0 1510157522514153474 [kwm, wizkid, le, septembre, panaaaame, ouiiiiii] Xaro Xaro_music 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.000000 0.000 Neutral
1 1510157521859878916 [abbyeedi, bgs, cooking, throwing, yabs, burna... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.350000 -0.175 Negative
2 1510157521377546242 [cruisewithmee, there, ’s, reason, industry, c... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z Sat Apr 02 07:29:45 +0000 2022 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN 0.000000 0.000 Neutral
3 1510157516432420865 [heisizumichaels, davido, wizkid, burna, boy, ... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Reply 0 1 NaN 0.558333 0.450 Positive
4 1510157513991278601 [dianaluv, nicki, minaj, body, wizkid, mentali... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z Sat Apr 02 07:29:43 +0000 2022 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.000000 0.000 Neutral

The next analysis I want to do is based on the time of the tweets. There are a lot of insights that can be gotten from here.

I'm going to convert the Created At column into a datetime column, so it is easier to work with.

In [81]:
wizkid['Created At'] = pd.to_datetime(wizkid['Created At'], dayfirst=True)
In [82]:
wizkid.head(5)
Out[82]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type subjectivity polarity analysis
0 1510157522514153474 [kwm, wizkid, le, septembre, panaaaame, ouiiiiii] Xaro Xaro_music 2022-04-02T07:29:45.000Z 2022-04-02 07:29:45+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.000000 0.000 Neutral
1 1510157521859878916 [abbyeedi, bgs, cooking, throwing, yabs, burna... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z 2022-04-02 07:29:45+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.350000 -0.175 Negative
2 1510157521377546242 [cruisewithmee, there, ’s, reason, industry, c... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z 2022-04-02 07:29:45+00:00 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN 0.000000 0.000 Neutral
3 1510157516432420865 [heisizumichaels, davido, wizkid, burna, boy, ... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z 2022-04-02 07:29:43+00:00 0 0 en <a href="http://twitter.com/download/android" ... Reply 0 1 NaN 0.558333 0.450 Positive
4 1510157513991278601 [dianaluv, nicki, minaj, body, wizkid, mentali... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z 2022-04-02 07:29:43+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.000000 0.000 Neutral
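The Created At strings follow a fixed layout, so they can also be parsed with an explicit format. Here is a minimal sketch using the standard library. `TWITTER_FORMAT` is my own name for the format string, worked out from the sample values shown above.

```python
from datetime import datetime

# Weekday name, month name, day, time, UTC offset, year.
TWITTER_FORMAT = '%a %b %d %H:%M:%S %z %Y'

created = datetime.strptime('Sat Apr 02 07:29:45 +0000 2022', TWITTER_FORMAT)
print(created.isoformat())
print(created.day, created.hour)
```

`pd.to_datetime` accepts a `format` argument with the same directives, which avoids relying on date-order guessing.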

From the Created At column, I'll be creating two more columns: the day column and the hour column.

In [83]:
wizkid['day_of_tweet'] = wizkid['Created At'].dt.day
wizkid['hour_of_tweet'] = wizkid['Created At'].dt.hour
In [84]:
wizkid.head(5)
Out[84]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type subjectivity polarity analysis day_of_tweet hour_of_tweet
0 1510157522514153474 [kwm, wizkid, le, septembre, panaaaame, ouiiiiii] Xaro Xaro_music 2022-04-02T07:29:45.000Z 2022-04-02 07:29:45+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.000000 0.000 Neutral 2 7
1 1510157521859878916 [abbyeedi, bgs, cooking, throwing, yabs, burna... Godwinvictor5 vkidofficial 2022-04-02T07:29:45.000Z 2022-04-02 07:29:45+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.350000 -0.175 Negative 2 7
2 1510157521377546242 [cruisewithmee, there, ’s, reason, industry, c... timsonkim pablotimsonTN 2022-04-02T07:29:45.000Z 2022-04-02 07:29:45+00:00 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN 0.000000 0.000 Neutral 2 7
3 1510157516432420865 [heisizumichaels, davido, wizkid, burna, boy, ... Sir Mondaylee💡 Mondaylee 2022-04-02T07:29:43.000Z 2022-04-02 07:29:43+00:00 0 0 en <a href="http://twitter.com/download/android" ... Reply 0 1 NaN 0.558333 0.450 Positive 2 7
4 1510157513991278601 [dianaluv, nicki, minaj, body, wizkid, mentali... 🇳🇬 OBA OF KOGI 👑 oba_tizer 2022-04-02T07:29:43.000Z 2022-04-02 07:29:43+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.000000 0.000 Neutral 2 7

After this, we'll be mapping each day with their respective day of the week

In [85]:
dw_mapping = {
    0: 'Monday',
    1: 'Tuesday',
    2: 'Wednesday',
    3: 'Thursday',
    4: 'Friday',
    5: 'Saturday',
    6: 'Sunday'
}

wizkid['day_of_week_name'] = wizkid['Created At'].dt.weekday.map(dw_mapping)
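The mapping above follows the same convention as Python's `weekday()` (Monday is 0). As a quick check, here is a minimal sketch with the standard library, using the date that appears in the data.

```python
import calendar
from datetime import date

tweet_date = date(2022, 4, 2)

# weekday() returns 0 for Monday through 6 for Sunday, matching dw_mapping above.
print(tweet_date.weekday(), calendar.day_name[tweet_date.weekday()])
```

pandas also offers `Series.dt.day_name()`, which returns the names directly without a hand-written dictionary.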

After checking the days of the week, I realized that all 2076 tweets were posted on Saturday 😂. That's a lot for one day really, too much.

Because of this, there's not much of a pattern that can be generated, as we don't have enough days to work with.

When I have time for this again, I'm gonna try to scrape more tweets that cover a longer duration and do more analysis on that. More insight can be generated then.

In [86]:
wizkid.tail(5)
Out[86]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type subjectivity polarity analysis day_of_tweet hour_of_tweet day_of_week_name
2071 1510152398886604803 [msfej, baba, still, post, wizkid, wins, sunda... CHIDI fineboijoshh 2022-04-02T07:09:23.000Z 2022-04-02 07:09:23+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.544444 0.578125 Positive 2 7 Saturday
2072 1510152394696441863 [jameelasosexy, wizkid, amp, buju, bnxn, said,... MITCHELLS callmemichaels 2022-04-02T07:09:22.000Z 2022-04-02 07:09:22+00:00 0 0 et <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.000000 0.000000 Neutral 2 7 Saturday
2073 1510152394327437317 [firstladyship, nigerian, stans, fight, over, ... Philip Phillbetter_ 2022-04-02T07:09:22.000Z 2022-04-02 07:09:22+00:00 0 0 en <a href="http://twitter.com/download/iphone" r... Retweet 0 0 NaN 0.749796 -0.010102 Negative 2 7 Saturday
2074 1510152389847830532 [asiwajulerry, wonder, people, like, wizkid, i... Chemical Father👑 Victor_theplug 2022-04-02T07:09:21.000Z 2022-04-02 07:09:21+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.500000 0.500000 Positive 2 7 Saturday
2075 1510152388098859008 [savvyelijah, nobody, davido, patiently, waiti... kanmiey🥺♥️ kanmiey 2022-04-02T07:09:21.000Z 2022-04-02 07:09:21+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.000000 0.000000 Neutral 2 7 Saturday

Next, I want to find out who most deserves the Wizkid fan badge for this period. I'll check the number of tweets each user made as well as the sentiment of those tweets.

In [87]:
users_df = wizkid['Name'].value_counts()
users_df.head(5)
Out[87]:
UptownGuy🦅🦅🦅    35
LIL DURK        35
Nehza🇳🇬🦅        33
BurnaBoyFan     29
timsonkim       24
Name: Name, dtype: int64

Two people have 35 tweets each on this Saturday alone. UptownGuy is listed first (the order between tied counts isn't meaningful), so let's check him out.
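One caveat with counting by `Name`: display names are not unique on Twitter, so two different accounts can share one. A hedged sketch of counting by handle instead is shown below. The toy rows and the handles `durk_a` and `durk_b` are made up for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Toy data: two different accounts share the display name "LIL DURK".
toy = pd.DataFrame({
    "Name": ["LIL DURK", "LIL DURK", "LIL DURK", "timsonkim"],
    "Screen Name": ["durk_a", "durk_a", "durk_b", "pablotimsonTN"],
})

# Counting by display name merges the two accounts.
by_name = toy["Name"].value_counts()

# Counting by handle keeps accounts separate; Name is kept for readability.
by_handle = (
    toy.groupby(["Screen Name", "Name"])
       .size()
       .sort_values(ascending=False)
)

print(by_name)
print(by_handle)
```

On the real data, the same `groupby` on `wizkid` would show whether the two users with 35 tweets are each a single account.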

In [88]:
uptownguy = wizkid.query("Name== 'UptownGuy🦅🦅🦅'")
uptownguy.head(5)
Out[88]:
Tweet Id Text Name Screen Name UTC Created At Favorites Retweets Language Client Tweet Type Hashtags Mentions Media Type subjectivity polarity analysis day_of_tweet hour_of_tweet day_of_week_name
74 1510157377554817027 [antigravitylite, prolly, didn, ’t, know, song... UptownGuy🦅🦅🦅 UptownGuy8 2022-04-02T07:29:10.000Z 2022-04-02 07:29:10+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.600000 0.1000 Positive 2 7 Saturday
242 1510156991263608832 [beaustevenblog, building, strongest, wizkid, ... UptownGuy🦅🦅🦅 UptownGuy8 2022-04-02T07:27:38.000Z 2022-04-02 07:27:38+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.000000 0.0000 Neutral 2 7 Saturday
308 1510156858459369473 [vivianporsche, time, sunday, still, undispute... UptownGuy🦅🦅🦅 UptownGuy8 2022-04-02T07:27:07.000Z 2022-04-02 07:27:07+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.500000 -0.2000 Negative 2 7 Saturday
333 1510156802306027521 [starboyeurope, uk, apple, music, songs, chart... UptownGuy🦅🦅🦅 UptownGuy8 2022-04-02T07:26:53.000Z 2022-04-02 07:26:53+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 photo 0.250000 0.3125 Positive 2 7 Saturday
367 1510156750556737542 [mmafiaxco, wait, davido, really, unfollow, wi... UptownGuy🦅🦅🦅 UptownGuy8 2022-04-02T07:26:41.000Z 2022-04-02 07:26:41+00:00 0 0 en <a href="http://twitter.com/download/android" ... Retweet 0 0 NaN 0.133333 0.1000 Positive 2 7 Saturday
In [89]:
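# Join the tokens of each tweet, and the tweets themselves, with spaces
# so that the last word of one tweet does not fuse with the first word of the next.
uptownguy_tweets = " ".join(" ".join(tokens) for tokens in uptownguy.Text)
In [90]:
wc = WordCloud(max_words=1000, width=1600, height=1000, collocations=False).generate(uptownguy_tweets)
plt.imshow(wc)
print(len(uptownguy_tweets))
3972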

Word cloud of UptownGuy's posts. He also talks a lot about Davido (we'll overlook that, though); apart from that, most of the other words seem positive.
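A word cloud is easy on the eyes, but a plain frequency count makes a claim like "talks a lot about Davido" easier to verify. This is a minimal sketch using only the standard library; the token lists are made-up stand-ins for `uptownguy.Text`.

```python
from collections import Counter

# Toy token lists standing in for the tokenized tweets in uptownguy.Text.
tweets = [
    ["wizkid", "song", "davido"],
    ["wizkid", "chart", "apple", "music"],
    ["davido", "unfollow", "wizkid"],
]

# Flatten the lists and count every token.
counts = Counter(token for tokens in tweets for token in tokens)

# The two most frequent tokens with their counts.
top = counts.most_common(2)
print(top)
```

On the real data, replacing `tweets` with `uptownguy.Text` would give the exact counts behind the word sizes in the cloud.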

In [91]:
uptownguy['analysis'].value_counts()
Out[91]:
Positive    21
Neutral      9
Negative     5
Name: analysis, dtype: int64
In [92]:
plt.title('uptownguy Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Counts')
uptownguy['analysis'].value_counts().plot(kind='bar')
plt.show()

This is impressive: 21 of UptownGuy's 35 posts are positive, which is 60% of his posts. That's cool, UptownGuy 🎉😂
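The percentage can also be computed directly instead of by hand. The sketch below rebuilds a Series from the counts shown above (21 positive, 9 neutral, 5 negative) as a stand-in for `uptownguy['analysis']`.

```python
import pandas as pd

# Rebuild the sentiment labels from the counts reported above.
analysis = pd.Series(["Positive"] * 21 + ["Neutral"] * 9 + ["Negative"] * 5)

# normalize=True returns proportions instead of raw counts.
shares = analysis.value_counts(normalize=True).mul(100).round(1)
print(shares)
```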

In [93]:
users = pd.read_excel('wizkid_tweets.xlsx', sheet_name='users')
users.head(5)
Out[93]:
User Id Name Screen Name UTC Created At Followers Following Favorites Tweets Lists Bio Location URL Verified Default Profile
0 1363640774290735104 Xaro Xaro_music 2021-02-22T00:05:03.000Z Mon Feb 22 00:05:03 +0000 2021 1991 2002 44872 30711 0 Wizkid FC🦅❤🖤\n\nAfrobeat to the world🌏\n\nXvX💎... Benin-City, Nigeria NaN False True
1 865315093541683207 Godwinvictor5 vkidofficial 2017-05-18T21:16:09.000Z Thu May 18 21:16:09 +0000 2017 1184 1040 239413 116381 1 NaN Benin-City, Nigeria NaN False True
2 996567708450918401 timsonkim pablotimsonTN 2018-05-16T01:47:11.000Z Wed May 16 01:47:11 +0000 2018 206 859 4738 35129 2 Big Wiz 4 life|Blue 4 life💙 Lagos, Nigeria NaN False True
3 239060144 Sir Mondaylee💡 Mondaylee 2011-01-16T18:23:31.000Z Sun Jan 16 18:23:31 +0000 2011 58536 43126 335002 193920 17 I love dogs🐩||Personal Development Enthusiast💎... Exactly where God wants me https://twitter.com/search?q=from%3Amondaylee%... False False
4 1346711853666344960 🇳🇬 OBA OF KOGI 👑 oba_tizer 2021-01-06T06:55:27.000Z Wed Jan 06 06:55:27 +0000 2021 958 740 10970 13143 0 DIGITAL MARKETER || ACTIVIST || PEACE ADVOCATE... outside NaN False True

Let's check some info about UptownGuy. To do this, we'll query the users sheet.

In [94]:
uptown_info = users.query('Name == "UptownGuy🦅🦅🦅"').head(1)
uptown_info
Out[94]:
User Id Name Screen Name UTC Created At Followers Following Favorites Tweets Lists Bio Location URL Verified Default Profile
74 1505566716335730688 UptownGuy🦅🦅🦅 UptownGuy8 2022-03-20T15:29:45.000Z Sun Mar 20 15:29:45 +0000 2022 122 152 7875 5734 0 God is the greatest. WiZkiD FC.. Gunner's For ... NaN NaN False True

The query shows he has 122 followers. His bio says he's a true Wizkid fan and also a Gunner, that is, an Arsenal fan. I don't know if this holds generally, but I often see Wizkid fans who are also Messi fans, so let's run a quick check on that.

In [95]:
print(uptown_info.Bio)
74    God is the greatest. WiZkiD FC.. Gunner's For ...
Name: Bio, dtype: object
In [96]:
users['Bio'] = users['Bio'].apply(lambda text: cleaning_stopwords(text))
users['Bio'].tail()
Out[96]:
2071                         Free thinker. CHELSEA WIZKID
2072                                          I'm simple.
2073                                       Football Lover
2074                                        Coming soon🔥😎
2075    shawty widda big smile😌||Manchester United fan...
Name: Bio, dtype: object
In [97]:
users['Bio']= users['Bio'].apply(lambda x: cleaning_punctuations(x))
users['Bio'].tail()
Out[97]:
2071                          Free thinker CHELSEA WIZKID
2072                                            Im simple
2073                                       Football Lover
2074                                        Coming soon🔥😎
2075    shawty widda big smile😌Manchester United fan❤😭...
Name: Bio, dtype: object
In [98]:
users['Bio'] = users['Bio'].apply(lambda x: cleaning_repeating_char(x)) 
users['Bio'].tail()
Out[98]:
2071                          Free thinker CHELSEA WIZKID
2072                                            Im simple
2073                                       Football Lover
2074                                        Coming soon🔥😎
2075    shawty widda big smile😌Manchester United fan❤😭...
Name: Bio, dtype: object
In [99]:
users['Bio'] = users['Bio'].apply(lambda x: cleaning_URLs(x))
users['Bio'].tail()
Out[99]:
2071                          Free thinker CHELSEA WIZKID
2072                                            Im simple
2073                                       Football Lover
2074                                        Coming soon🔥😎
2075    shawty widda big smile😌Manchester United fan❤😭...
Name: Bio, dtype: object
In [100]:
users['Bio'] = users['Bio'].apply(lambda x: cleaning_numbers(x))
users['Bio'].tail()
Out[100]:
2071                          Free thinker CHELSEA WIZKID
2072                                            Im simple
2073                                       Football Lover
2074                                        Coming soon🔥😎
2075    shawty widda big smile😌Manchester United fan❤😭...
Name: Bio, dtype: object
In [101]:
users['Bio'] = users['Bio'].apply(tokenizer.tokenize)
users['Bio'].head()
Out[101]:
0    [Wizkid, FC, 🦅❤🖤, Afrobeat, world, 🌏, XvX, 💎💙💎...
1                                                [nan]
2                        [Big, Wiz, lifeBlue, life, 💙]
3    [I, love, dogs, 🐩Personal, Development, Enthus...
4    [DIGITAL, MARKETER, ACTIVIST, PEACE, ADVOCATE,...
Name: Bio, dtype: object
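Row 1 above contains the token `nan`. The user has no bio, and the missing value was presumably turned into the literal string "nan" somewhere in the cleaning steps, so it ends up counted as a word. The sketch below shows one way this can happen and one way to avoid it. It uses `str.split` as a simple stand-in for the notebook's `tokenizer`, and the toy `bios` Series is illustrative.

```python
import numpy as np
import pandas as pd

# One real-looking bio and one missing bio.
bios = pd.Series(["Big Wiz 4 life", np.nan])

# Converting a missing value with str() yields the text "nan",
# which then survives tokenization as an ordinary word.
naive = bios.apply(lambda b: str(b).split())

# Replacing missing bios with an empty string first gives an empty token list.
safe = bios.fillna("").apply(str.split)

print(naive.tolist())
print(safe.tolist())
```

Applying `fillna("")` to `users['Bio']` before the first cleaning step is one possible fix; it would keep "nan" out of the bio word cloud.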
In [102]:
users['Bio']= users['Bio'].apply(lambda x: stemming_on_text(x)) 
users['Bio'].head()
Out[102]:
0    [Wizkid, FC, 🦅❤🖤, Afrobeat, world, 🌏, XvX, 💎💙💎...
1                                                [nan]
2                        [Big, Wiz, lifeBlue, life, 💙]
3    [I, love, dogs, 🐩Personal, Development, Enthus...
4    [DIGITAL, MARKETER, ACTIVIST, PEACE, ADVOCATE,...
Name: Bio, dtype: object
In [103]:
users['Bio'] = users['Bio'].apply(lambda x: lemmatizer_on_text(x))
users['Bio'].head()
Out[103]:
0    [Wizkid, FC, 🦅❤🖤, Afrobeat, world, 🌏, XvX, 💎💙💎...
1                                                [nan]
2                        [Big, Wiz, lifeBlue, life, 💙]
3    [I, love, dogs, 🐩Personal, Development, Enthus...
4    [DIGITAL, MARKETER, ACTIVIST, PEACE, ADVOCATE,...
Name: Bio, dtype: object
In [104]:
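# Join the tokens of each bio, and the bios themselves, with spaces
# so that words from adjacent bios do not fuse together.
users_bio = " ".join(" ".join(tokens) for tokens in users.Bio)
In [105]:
wc = WordCloud(max_words=1000, width=1600, height=1000, collocations=False).generate(users_bio)
plt.imshow(wc)
print(len(users_bio))
110595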

Ahhhhh, my guess was wrong after all. Judging by the word cloud, Wizkid shares more fans with Ronaldo than with Messi, though it's still good to see that a fair number of them are Messi fans.

Another insight here is that Wizkid fans also like Burna Boy. That's cool, two Grammy award winners 😂.
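Comparing word sizes by eye is rough, so here is a hedged sketch of how the Messi, Ronaldo and Burna comparison could be backed with actual counts. It assumes the bios are still raw strings (that is, before the tokenization cell), and the toy bios below are invented for illustration.

```python
import pandas as pd

# Made-up bios; None stands for a user without a bio.
bios = pd.Series([
    "Wizkid FC. Ronaldo is the GOAT",
    "Messi fan. Big Wiz 4 life",
    "CR7 and Wizkid",  # nicknames like CR7 would need their own keyword
    None,
    "Burna Boy and Wizkid, Ronaldo stan",
])

keywords = ["messi", "ronaldo", "burna"]

# Count bios mentioning each keyword, ignoring case and missing bios.
mentions = {
    kw: int(bios.str.contains(kw, case=False, na=False).sum())
    for kw in keywords
}
print(mentions)
```

Note that the users sheet appears to hold one row per tweet, so dropping duplicate users first (for example on `Screen Name`) would avoid counting very active fans several times.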

So, we have finally come to the end of this. It was worthwhile and I made some pretty interesting discoveries. Cheers 🎉🎉